Add docs for IOI #948

Merged: Kipok merged 5 commits into main from feat/ioi_docs on Oct 23, 2025

Conversation

@SeanNaren SeanNaren (Collaborator) commented Oct 15, 2025

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive IOI evaluation workflow covering IOI24/IOI25 contexts.
    • Step-by-step data preparation, running evaluation, and results verification guidance.
    • Example evaluation command set to 50 solutions per sub-task and notes for cluster/server use.
    • Replaced older IOI subsection with a broader IOI-focused workflow.
    • Removed an explicit benchmark-declaration line from the human-eval-infilling section.

@coderabbitai coderabbitai bot (Contributor) commented Oct 15, 2025

Walkthrough

Replaced the ioi24 subsection with a single IOI section documenting IOI24/IOI25: dataset preparation via ns prepare_data, ns eval usage with Slurm/local options and multi-solution settings (e.g., 50 solutions per subtask), and result verification. Removed a benchmark-definition line from human-eval-infilling.

Changes

Cohort / File(s) Change summary
Documentation — IOI evaluation workflow
docs/evaluation/code.md
Replaced the ioi24-specific subsection with a unified IOI section covering IOI24/IOI25: added data preparation steps (`ns prepare_data ioi24
Documentation — human-eval-infilling tweak
docs/evaluation/code.md
Removed the explicit "Benchmark is defined" declaration line in the human-eval-infilling subsection, leaving only the original benchmark source link.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant NS as ns CLI
  participant DS as Dataset Store
  participant SL as Slurm Scheduler
  participant EV as Evaluator
  participant RS as Results/Logs

  rect rgb(235,245,255)
    note over U,NS: Data preparation (IOI24/IOI25)
    U->>NS: ns prepare_data ioi24
    NS->>DS: fetch & prepare IOI artifacts
    DS-->>NS: prepared dataset path
    NS-->>U: prints prepared-data path
  end

  rect rgb(240,255,240)
    note over U,NS: Evaluation (multi-solution)
    U->>NS: ns eval --benchmarks=ioi24:50 --cluster=<CLUSTER_NAME> ...
    alt Slurm
      NS->>SL: submit evaluation jobs
      SL->>EV: start evaluator tasks
    else Local
      NS->>EV: run evaluator locally
    end
    EV->>DS: load prepared data
    EV->>EV: generate N solutions per subtask
    EV->>RS: write metrics, logs, artifacts
    RS-->>U: results path for verification
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble docs with nimble paws,
IOI24 and IOI25 join the cause,
Prepare the data, then eval the run,
Fifty solutions until scoring's done,
Carrots tallied, results delight — hop on! 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "Add docs for IOI" is concise, clear, and directly related to the main change in the pull request. The changeset primarily focuses on adding and expanding documentation for IOI (both IOI24 and IOI25 support) in the evaluation code documentation file, which matches exactly what the title conveys. The title is specific enough that a developer scanning the commit history would immediately understand that this PR introduces IOI-related documentation, without being overly verbose or generic. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/evaluation/code.md (2)

378-405: Add shell language hints to command snippets.

Please tag these fenced blocks as bash (or shell) so rendered docs get syntax highlighting and downstream linters stop flagging them.
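
As a rough illustration of what the MD040 warning checks, the sketch below scans Markdown text for opening fences that carry no language tag. It is a simplified stand-in, not markdownlint's actual implementation: the function name `fences_missing_language` is made up for this example, and fences nested inside lists or block quotes are not handled.

```python
import re

# A fence line: up to 3 spaces of indent, then 3+ backticks or tildes,
# followed by an optional info string (the language tag).
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences with no language tag."""
    missing = []
    open_marker = None  # marker that opened the current block, or None
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_marker is None:
            # This line opens a block; flag it if the info string is empty.
            open_marker = marker
            if not info:
                missing.append(lineno)
        elif (marker[0] == open_marker[0]
              and len(marker) >= len(open_marker)
              and not info):
            # Same fence character, at least as long, no info string: a close.
            open_marker = None
    return missing


if __name__ == "__main__":
    fence = "`" * 3
    sample = "\n".join(["Intro", fence, "ns prepare_data ioi24", fence])
    print(fences_missing_language(sample))  # [2]
```

Tagging the opening fence on the reported line with `bash` (or `text` for sample output) clears the hit.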


371-372: Use descriptive link text.

Replace bare “here” with something like “IOI24 dataset on HuggingFace” to satisfy MD059 and improve accessibility.
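
A similar check can be sketched for the MD059 hint. The snippet below lists inline links whose text is a generic phrase such as "here". The phrase list and the helper name `generic_links` are assumptions for illustration and may not match the linter's configured defaults.

```python
import re

# Inline links of the form [text](url); reference-style links are ignored.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Assumed list of non-descriptive phrases; adjust to your linter config.
GENERIC_TEXTS = {"here", "click here", "link", "more"}


def generic_links(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line number, link text, url) for links with generic text."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in INLINE_LINK.finditer(line):
            text, url = match.group(1), match.group(2)
            if text.strip().lower() in GENERIC_TEXTS:
                hits.append((lineno, text, url))
    return hits


if __name__ == "__main__":
    line = "Original benchmark source is [here](https://huggingface.co/datasets/open-r1/ioi)."
    for lineno, text, url in generic_links(line):
        print(f"line {lineno}: link text {text!r} points to {url}")
```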

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea29b1b and f73384b.

📒 Files selected for processing (1)
  • docs/evaluation/code.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md

367-367: Link text should be descriptive

(MD059, descriptive-link-text)


371-371: Link text should be descriptive

(MD059, descriptive-link-text)


377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
docs/evaluation/code.md (2)

371-372: Use descriptive link text for accessibility.

Replace the bare “here” link text with something meaningful like “IOI24 dataset on Hugging Face” so screen readers convey the destination. Based on static analysis hints


377-405: Annotate shell snippets with their language.

Please add a language like bash to the fenced code blocks so tooling and syntax highlighting work correctly. Based on static analysis hints

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f73384b and 8be70d9.

📒 Files selected for processing (1)
  • docs/evaluation/code.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md

367-367: Link text should be descriptive

(MD059, descriptive-link-text)


371-371: Link text should be descriptive

(MD059, descriptive-link-text)


377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
docs/evaluation/code.md (3)

371-372: Use descriptive link text for accessibility.

Replace “here” with meaningful link text so screen readers convey where the URL leads.

-- Original benchmark source is [here](https://huggingface.co/datasets/open-r1/ioi).
+- Original benchmark source is the [Open-R1 IOI dataset on Hugging Face](https://huggingface.co/datasets/open-r1/ioi).

377-379: Add a language identifier to the CLI code fence.

Specify the shell language for proper syntax highlighting and lint compliance.

-```
+```bash
 ns prepare_data ioi24

389-405: Add the shell language to the eval command fence.

Mark the fence as bash to improve readability and satisfy markdown linting.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
     --server_type=vllm \
     --server_args="--async-scheduling" \
     --server_nodes=1 \
     --server_gpus=8 \
     --benchmarks=ioi24:50 \
     --with_sandbox \
     --split=test \
     --data_dir=<DATA_DIR> \
     --output_dir=<OUTPUT_DIR> \
     --extra_eval_args="++eval_config.test_file=<PATH_TO_METADATA_TEST_FILE>" \
     ++inference.temperature=0.6 \
     ++inference.top_p=0.95 \
     ++inference.tokens_to_generate=65536
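
For scripted runs, the `ns eval` invocation quoted in this review can be assembled programmatically. The sketch below only builds and prints the argument list; it does not invoke `ns`. The helper name `build_eval_command` and its parameters are hypothetical, the flags are copied from the command in this review, and reading `ioi24:50` as `<benchmark>:<number of samples>` is an assumption based on the PR summary's "50 solutions per sub-task".

```python
import shlex


def build_eval_command(cluster, data_dir, output_dir, test_file,
                       benchmark="ioi24", num_samples=50):
    """Build the argv list for the ns eval command quoted in this review."""
    return [
        "ns", "eval",
        f"--cluster={cluster}",
        "--model=nvidia/OpenReasoning-Nemotron-32B",
        "--server_type=vllm",
        "--server_args=--async-scheduling",
        "--server_nodes=1",
        "--server_gpus=8",
        # Assumed format: benchmark name, colon, samples per problem.
        f"--benchmarks={benchmark}:{num_samples}",
        "--with_sandbox",
        "--split=test",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--extra_eval_args=++eval_config.test_file={test_file}",
        "++inference.temperature=0.6",
        "++inference.top_p=0.95",
        "++inference.tokens_to_generate=65536",
    ]


if __name__ == "__main__":
    cmd = build_eval_command("<CLUSTER_NAME>", "<DATA_DIR>", "<OUTPUT_DIR>",
                             "<PATH_TO_METADATA_TEST_FILE>")
    # shlex.join quotes the placeholders so the line is safe to paste.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` instead of printing it avoids shell quoting issues with the `++` overrides.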

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8be70d991c74f44e1235ca5a96891df490dba36b and 04a6eede4e33a4f7291fb4d0c3f1ed092326d07d.

📒 Files selected for processing (1)
  • docs/evaluation/code.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md

367-367: Link text should be descriptive

(MD059, descriptive-link-text)


371-371: Link text should be descriptive

(MD059, descriptive-link-text)


377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
docs/evaluation/code.md (4)

181-184: Add code block language specifiers and use descriptive link text.

The IOI section has markdown linting violations. The code block starting at line 189 is missing a language specifier, and link text at line 183 should be descriptive rather than "here".

Apply this diff:

 ### IOI

-We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [in the open-r1/ioi dataset](https://huggingface.co/datasets/open-r1/ioi).

189-191: Specify code block language.

The fenced code block should declare a language for syntax highlighting.

Apply this diff:

 First, prepare the dataset by running the `ns prepare_data` command. The arguments below will generate `test.jsonl` and `test_metadata.json`.

-```
+```bash
 ns prepare_data ioi24
-```
+```

201-218: Specify code block language for the evaluation command.

The fenced code block should declare a language for syntax highlighting.

Apply this diff:

 This command runs an evaluation of [OpenReasoning-Nemotron-32B](https://huggingface.co/nvidia/OpenReasoning-Nemotron-32B) on a Slurm cluster.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \

224-229: Specify code block language for the results output.

The fenced code block showing expected output should declare a language for consistency.

Apply this diff:

 After all jobs are complete, you can check the results in `<OUTPUT_DIR>/eval-results/ioi24/metrics.json`. You can also take a look at `<OUTPUT_DIR>/eval-results/ioi24/summarized-results/main_*`. They should look something like this:

-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04a6eed and 55306fc.

📒 Files selected for processing (1)
  • docs/evaluation/code.md (2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md

183-183: Link text should be descriptive

(MD059, descriptive-link-text)


189-189: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


201-201: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


224-224: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


414-414: Link text should be descriptive

(MD059, descriptive-link-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (1)
docs/evaluation/code.md (1)

411-414: Verify intentional removal of benchmark definition line in human-eval-infilling.

The human-eval-infilling section now differs structurally from other benchmarks: the "Benchmark is defined in..." line was removed, leaving only the original benchmark source link. Confirm this removal was intentional and aligns with documentation standards.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
docs/evaluation/code.md (4)

180-190: Add language specifier to code block and use descriptive link text.

Per markdown best practices, code blocks should specify the language and links should use descriptive text instead of "here".

-We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen in the [open-r1/ioi dataset](https://huggingface.co/datasets/open-r1/ioi).

 #### Data Preparation

 First, prepare the dataset by running the `ns prepare_data` command. The arguments below will generate `test.jsonl` and `test_metadata.json`.

-```
+```bash
 ns prepare_data ioi24

200-217: Add language specifier to ns eval command block.

Specify bash as the language for the code block to improve formatting and readability.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \

223-228: Add language specifier to results output block.

Specify the language for the code block to maintain consistency with other sections.

-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
 evaluation_mode   | num_entries | avg_tokens | gen_seconds | correct       | total_score        | round_robin_score

410-413: Use descriptive link text instead of "here".

Replace the generic "here" with text that describes the link destination.

-
-- Original benchmark source is [here](https://github.com/openai/human-eval-infilling).
+
+- Original benchmark source is at [openai/human-eval-infilling](https://github.com/openai/human-eval-infilling).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55306fc and 0c447fa.

📒 Files selected for processing (1)
  • docs/evaluation/code.md (2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md

182-182: Link text should be descriptive

(MD059, descriptive-link-text)


188-188: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


200-200: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


223-223: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


413-413: Link text should be descriptive

(MD059, descriptive-link-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (1)
docs/evaluation/code.md (1)

180-229: Excellent IOI documentation additions.

The new IOI section is well-structured and comprehensive, covering data preparation, evaluation with multi-solution settings, and results verification. It directly addresses previous feedback about including expected output examples. The typo fix (METADATA_TEST_FILE) from the earlier review has been properly applied. The section follows established patterns from other benchmarks and provides clear, actionable instructions for users.

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
docs/evaluation/code.md (5)

182-182: Make link text descriptive.

Use descriptive, meaningful text that clearly indicates the link destination. Avoid generic phrases that provide no context about where the link leads. Replace the generic "here" with text that describes the link target, e.g., "IOI24 dataset on Hugging Face" or similar.

-The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+The original data for IOI24 can be seen in the [open-r1 IOI dataset](https://huggingface.co/datasets/open-r1/ioi).

188-190: Add language identifier to code block.

Specify bash as the language for syntax highlighting.

-```
+```bash
 ns prepare_data ioi24
-```
+```

200-217: Add language identifier to code block.

Specify bash as the language for syntax highlighting.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
-```
+```

223-228: Add language identifier to code block.

Specify a language (e.g., text or plain) for the example output block.

-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
 evaluation_mode   | num_entries | avg_tokens | gen_seconds | correct       | total_score        | round_robin_score
 pass@1[avg-of-50] | 39          | 40387      | 7410        | 0.51% ± 1.04% | 303.47             | 261.01
 pass@50           | 39          | 40387      | 7410        | 2.56%         | 303.47             | 261.01
-```
+```
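
To help read the two rows in the sample output, here is a minimal sketch of how such figures could arise. It assumes `pass@1[avg-of-50]` is the per-sample success rate averaged over entries and `pass@50` is the fraction of entries with at least one fully correct sample. Both definitions, the function name `pass_metrics`, and the input data are assumptions for illustration, not taken from the evaluator's code. Under them, the quoted values are consistent with 1 of 39 entries being solved in 10 of its 50 attempts.

```python
def pass_metrics(results):
    """Compute (pass@1 averaged over samples, pass@k) from per-problem results.

    results[i][j] is True if sample j for problem i was fully correct.
    """
    num_problems = len(results)
    # Mean per-sample success rate, averaged across problems.
    pass_at_1 = sum(sum(r) / len(r) for r in results) / num_problems
    # Fraction of problems with at least one correct sample.
    pass_at_k = sum(any(r) for r in results) / num_problems
    return pass_at_1, pass_at_k


if __name__ == "__main__":
    # Hypothetical data: 39 problems x 50 samples, one problem solved 10 times.
    results = [[False] * 50 for _ in range(39)]
    results[0][:10] = [True] * 10
    p1, pk = pass_metrics(results)
    print(f"pass@1[avg-of-50] = {p1:.2%}, pass@50 = {pk:.2%}")
    # prints: pass@1[avg-of-50] = 0.51%, pass@50 = 2.56%
```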

413-413: Make link text descriptive.

Use descriptive, meaningful text that clearly indicates the link destination. Avoid generic phrases that provide no context about where the link leads. Replace the generic "here" with text describing the benchmark source, e.g., "human-eval-infilling repository" or similar.

-- Original benchmark source is [here](https://github.com/openai/human-eval-infilling).
+- Original benchmark source is the [human-eval-infilling repository](https://github.com/openai/human-eval-infilling).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c447fa and ca6d3d8.

📒 Files selected for processing (1)
  • docs/evaluation/code.md (2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md

182-182: Link text should be descriptive

(MD059, descriptive-link-text)


188-188: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


200-200: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


223-223: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


413-413: Link text should be descriptive

(MD059, descriptive-link-text)

🔇 Additional comments (1)
docs/evaluation/code.md (1)

180-229: Comprehensive IOI documentation with clear workflow.

The section provides a complete walkthrough: data preparation, evaluation execution with a realistic example, and result verification including expected output. This addresses prior feedback effectively and gives users clear guidance on IOI24/IOI25 evaluation.

@Kipok Kipok merged commit c8252d5 into main Oct 23, 2025
6 checks passed
@Kipok Kipok deleted the feat/ioi_docs branch October 23, 2025 19:23
dgtm777 pushed a commit that referenced this pull request Oct 29, 2025
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>
@coderabbitai coderabbitai bot mentioned this pull request Dec 4, 2025
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: dgitman <dgitman@nvidia.com>